LONG-TERM  GOALS

Develop  an  advanced  global  atmospheric  forecast  system  designed  to  exploit  massively  parallel 
processor  (MPP),  distributed  memory  computer  architectures.  Future  increases  in  computer  power 
from  MPPs  will  allow  substantial  increases  in  model  resolution,  more  realistic  physical  processes,  and
more  sophisticated  data  assimilation  methods,  all  of  which  will  improve  operational  numerical  weather 
predictions  and  provide  better  simulations  of  the  Earth’s  climate. 

OBJECTIVES 

The  current  Navy  operational  global  atmospheric  prediction  system  (NOGAPS  4.0)  is  a  highly 
optimized  Fortran  code  designed  to  run  on  parallel  vector,  shared  memory  machines  (Crays).  The
immediate  objective  of  the  project  is  to  redesign  the  model’s  numerical  algorithms  and  data  structures  to 
allow  efficient  execution  on  MPP  architectures  and  clusters  of  shared  memory  processors.  Message 
passing  (MPI)  is  the  paradigm  chosen  for  communication  between  distributed  memory  processors.  This 
work  is  supported  by  ONR  Marine  Meteorology,  PE  0602435N  (035-71).

APPROACH 

Use  integrations  of  the  current  operational  NOGAPS  as  control  runs  to  ensure  reproducibility  of  results 
with  the  newly  designed  Fortran  90  code.  Design  efficient  spectral  transform  algorithms  for  both  shared 
memory  and  distributed  memory  architectures.  For  distributed  memory  architectures  use  message 
passing  library  modules  in  communication  intensive  spectral  transforms  and  horizontal  interpolation 
routines. 
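
A control-run comparison of this kind can be sketched as follows. This is a minimal illustration, not the project's actual verification procedure: the function names and the relative tolerance are assumptions, and plain Python lists stand in for model output fields. A tolerance is used because a change in summation order on a distributed machine can alter the last bits of a floating-point result.

```python
def max_abs_and_rel_diff(control, candidate):
    """Return (max absolute difference, max relative difference) of two fields."""
    if len(control) != len(candidate):
        raise ValueError("fields must have the same number of points")
    max_abs = 0.0
    max_rel = 0.0
    for c, x in zip(control, candidate):
        diff = abs(c - x)
        max_abs = max(max_abs, diff)
        scale = max(abs(c), abs(x))
        if scale > 0.0:
            max_rel = max(max_rel, diff / scale)
    return max_abs, max_rel


def reproduces(control, candidate, rel_tol=1e-12):
    """True if the candidate field matches the control run within rel_tol.

    rel_tol is a hypothetical threshold chosen for illustration only.
    """
    _, max_rel = max_abs_and_rel_diff(control, candidate)
    return max_rel <= rel_tol


if __name__ == "__main__":
    control_field = [101325.0, 100800.5, 99750.25]    # made-up sample values
    new_code_field = [101325.0, 100800.5, 99750.25]
    print("reproduces control run:", reproduces(control_field, new_code_field))
```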

The  current  NOGAPS  spectral  formulation  requires  global  communication  for  the  spherical  harmonic 
transforms.  An  attractive  alternative  is  the  use  of  quasi-uniform  icosahedral  grids  based  on  local  basis 
functions  that  are  less  communication  intensive.  A  development  effort  on  this  next-generation 
NOGAPS  has  begun. 
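
The global communication required by the spherical harmonic transforms can be pictured with a small simulation. The sketch below assumes a transposition strategy, a common approach for parallel spectral models but not necessarily the one used in NOGAPS: each rank starts with a block of latitudes holding all zonal wavenumbers, and must end with a block of wavenumbers holding all latitudes, so every rank exchanges data with every other rank. All names are hypothetical, and Python lists and dicts stand in for MPI buffers.

```python
def block_ranges(n_items, n_ranks):
    """Split range(n_items) into n_ranks contiguous, nearly equal blocks."""
    base, extra = divmod(n_items, n_ranks)
    ranges = []
    start = 0
    for rank in range(n_ranks):
        size = base + (1 if rank < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def transpose_lat_to_wave(local_by_lat, n_waves, n_ranks):
    """Simulate an all-to-all exchange between two decompositions.

    local_by_lat[r] maps latitude index -> list of values, one per wavenumber.
    Returns (local_by_wave, messages), where local_by_wave[r] maps
    wavenumber -> {latitude index: value} and messages counts the
    rank-to-rank transfers that a real exchange would need.
    """
    wave_ranges = block_ranges(n_waves, n_ranks)
    local_by_wave = [dict() for _ in range(n_ranks)]
    messages = 0
    for src in range(n_ranks):
        for dst in range(n_ranks):
            if src != dst and local_by_lat[src] and len(wave_ranges[dst]) > 0:
                messages += 1  # one transfer per ordered pair of distinct ranks
            for lat, row in local_by_lat[src].items():
                for m in wave_ranges[dst]:
                    local_by_wave[dst].setdefault(m, {})[lat] = row[m]
    return local_by_wave, messages


if __name__ == "__main__":
    n_lats, n_waves, n_ranks = 8, 4, 4   # toy sizes, for illustration only
    lat_ranges = block_ranges(n_lats, n_ranks)
    by_lat = [
        {lat: [100 * lat + m for m in range(n_waves)] for lat in lat_ranges[r]}
        for r in range(n_ranks)
    ]
    by_wave, messages = transpose_lat_to_wave(by_lat, n_waves, n_ranks)
    print("rank-to-rank transfers:", messages)
```

The number of transfers grows with the square of the number of ranks, which is why methods built on local basis functions, needing only neighbor exchanges, are attractive.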

WORK  COMPLETED 

The  complete  NOGAPS  spectral  forecast  model  has  been  ported  to  a  scalable  architecture  design  using 
MPI  as  the  communication  methodology.  The  code  has  been  run  successfully  on  the  Cray  T3E,  SGI 
Origin  2000,  DEC  8400  SMP,  and  Cray  C90.  The  computational  core  of  the  model,  including  the
solution  of  the  dynamical  equations  and  the  diabatic  processes,  scales  very  well  to  at  least  200
processors.  The  I/O  and  communication  intensive  pre-processing  and  post-processing  parts  of  the
model  do  not  scale  well,  and  are  candidates  for  separation  from  the  main  model  and  porting  to  shared
memory  systems  better  suited  to  this  kind  of  computation.  Such  a  heterogeneous  computing
environment  is  inevitable  for  large,  complex  production  codes  such  as  NOGAPS. 

A  shallow  water  version  of  the  icosahedral  grid  NOGAPS  has  been  completed  and  extensively  tested. 

RESULTS 

The  scalable  NOGAPS  MPI  code  has  been  run  extensively  on  the  T3E  at  a  variety  of  resolutions  and 
over  a  varying  number  of  processors  to  test  performance  and  robustness.  The  figure  below  shows 
results  for  a  T159L32  NOGAPS,  which  is  representative  of  the  current  operational  resolution,  and 
processor  numbers  from  15  to  240.  The  computational  core  of  the  model,  represented  by  ‘diabat’, 
‘dry_dyn’,  and  ‘MPI_trans’,  scales  reasonably  well  over  this  range.  The  ‘dry_dyn’  line  is  from  the
dynamical  equations  and  shows  the  effects  of  varying  cache  reuse,  but  overall  is  almost  parallel  with  the 
perfect  scaling  line,  i.e.,  the  one  marked  ‘perfect’.  The  ‘diabat’  line  shows  excellent  scaling  to  60 
processors,  but  falls  off  beyond  that.  This  is  almost  entirely  due  to  severe  load  imbalances  in  the  convective
parameterization,  which  is  concentrated  in  the  tropics. 
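
The load imbalance described above can be illustrated with a toy calculation. The sketch assumes a hypothetical per-latitude cost for the convective parameterization (a constant plus a bump centered on the equator) and compares two ways of assigning latitudes to processors: contiguous bands and a cyclic (round-robin) assignment. The cost function, the 240 latitudes, and the 60 ranks are illustrative assumptions, not NOGAPS measurements, and the cyclic assignment is shown only as a textbook remedy, not as the approach taken in this project.

```python
import math


def convective_cost(lat_deg):
    """Hypothetical cost per latitude row: base work plus a tropical bump."""
    return 1.0 + 4.0 * math.exp(-((lat_deg / 20.0) ** 2))


def contiguous_loads(costs, n_ranks):
    """Assign latitude rows to ranks in contiguous bands; return load per rank."""
    base, extra = divmod(len(costs), n_ranks)
    loads = []
    start = 0
    for rank in range(n_ranks):
        size = base + (1 if rank < extra else 0)
        loads.append(sum(costs[start:start + size]))
        start += size
    return loads


def cyclic_loads(costs, n_ranks):
    """Assign latitude row i to rank i mod n_ranks; return load per rank."""
    loads = [0.0] * n_ranks
    for i, cost in enumerate(costs):
        loads[i % n_ranks] += cost
    return loads


def imbalance(loads):
    """Ratio of the busiest rank's load to the mean load (1.0 is perfect)."""
    return max(loads) / (sum(loads) / len(loads))


if __name__ == "__main__":
    n_lats, n_ranks = 240, 60   # illustrative sizes
    lats = [-90.0 + (i + 0.5) * 180.0 / n_lats for i in range(n_lats)]
    costs = [convective_cost(lat) for lat in lats]
    print("contiguous bands:", round(imbalance(contiguous_loads(costs, n_ranks)), 2))
    print("cyclic:          ", round(imbalance(cyclic_loads(costs, n_ranks)), 2))
```

Because every rank waits for the busiest one, the imbalance ratio is a rough upper bound on the parallel efficiency lost in that step.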

The  lines  ‘hist_writ’,  ‘dagnos’,  and  ‘p2sig’  are  I/O,  diagnostics,  and  pre-processing,  respectively.  They
show  a  total  lack  of  scaling.  For  the  first  two,  this  is  not  surprising,  since  they  are  largely  single  processor
operations  that  potentially  can  be  improved  with  the  incorporation  of  the  yet  to  be  released  MPI-2 
standard.  The  pre-processing  step,  however,  is  clearly  a  problem  area.  The  cubic  spline  interpolations 
used  are  communication  intensive  and  scale  disastrously  above  60  processors.  NOGAPS
post-processing  shows  similar  properties.  Current  NOGAPS  pre-  and  post-processing  is  clearly  not
appropriate  for  distributed  memory  architectures  and  will  require  alternate  strategies  in  future 
heterogeneous  computing  environments. 
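
The effect of largely single-processor steps on overall scaling can be estimated with Amdahl's law. The sketch below is illustrative only: the serial fraction is a hypothetical input, not a value measured for NOGAPS, and the intermediate processor counts are assumed for the example.

```python
def amdahl_speedup(serial_fraction, n_procs):
    """Ideal speedup on n_procs when serial_fraction of the work cannot be split."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must lie in [0, 1]")
    if n_procs < 1:
        raise ValueError("n_procs must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_procs)


if __name__ == "__main__":
    serial_fraction = 0.05   # hypothetical share of single-processor work
    for n_procs in (15, 30, 60, 120, 240):
        speedup = amdahl_speedup(serial_fraction, n_procs)
        print(n_procs, "processors: speedup", round(speedup, 1))
```

Even a small serial share caps the achievable speedup, which is one argument for moving such steps off the distributed memory machine.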

The  icosahedral  grid  NOGAPS  has  been  used  to  evaluate  the  relative  merits  of  local  finite  element  and 
local  spectral  element  methods  on  this  kind  of  quasi-regular  grid.  A  number  of  papers  and
presentations  on  the  results  have  been  published. 

IMPACT 

NOGAPS  is  run  operationally  by  FNMOC  and  is  the  heart  of  the  Navy’s  operational  weather  prediction 
support  to  nearly  all  DOD  users  worldwide.  It  is  also  run  by  many  NRL  and  other  Navy  researchers  to
study  atmospheric  dynamics  and  atmosphere/ocean  interaction.  Our  work  here  targets  the  next
generation  of  this  system  for  the  next  generation  of  computer  architectures.  These  architectures  are 
expected  to  be  distributed  memory,  commodity  based  systems  with  enormous  theoretical  computational 
power.  However,  exploiting  this  capability  will  require  drastically  redesigning  many  important  model 
algorithms.

TRANSITIONS

Improved  algorithms  for  model  processes  will  be  transitioned  to  6.4  (PE  0603207N)  as  they  are  ready, 
and  will  ultimately  be  transitioned  to  FNMOC  with  future  NOGAPS  upgrades.  Development  of  the 
MPI  NOGAPS  code  has  necessitated  close  examination  of  the  algorithms  used  in  the  operational  model, 
and  in  some  cases  uncovered  design  weaknesses  and  bugs  that  are  being  promptly  corrected  in  the 
operational  NOGAPS. 

The  scalable  NOGAPS  MPI  code  has  been  provided  to  FNMOC  as  the  flagship  benchmark  code  for 
their  planned  FY99  procurement  of  a  scalable  system  to  replace  the  current  operational  C90s.

[Figure:  scaling  results  for  T159L32  NOGAPS  on  the  T3E]

RELATED  PROJECTS 

(1)  NOGAPS  4.0  Evaluation  (X0513-01):  Advanced  development  and  transition  of  the  NOGAPS  4.0 
forecast  model  to  operational  status  at  FNMOC.  (2)  The  DOD  CHSSI  Scaled  Software  algorithm 
development  for  meteorological  models  (HPCM-96-032):  Development  of  numerical  algorithms 
appropriate  for  massively  parallel  computer  architectures.  These  algorithms  will  be  critical  for
inter-processor  communication  dependent  and  computationally  intensive  model  processes.